
tests: fix parametrize patterns rejected by pytest 9.1.0 #2212

Merged
rwgk merged 1 commit into NVIDIA:main from leofang:leofang/fix-pytest9-collection-errors
Jun 14, 2026

leofang (Member) commented Jun 14, 2026

Summary

main has been red since pytest 9.1.0 landed on PyPI: every Test linux-* / Test win-* matrix entry fails at pytest collection time, before any actual test runs. The cause is two unrelated latent bugs in our test code, both tolerated by older pytest but rejected by pytest 9.1.0's stricter parametrize validation:

Bug 1: trailing comma in parametrize name (cuda_core)

cuda_core/tests/test_utils.py:151 had:

@pytest.mark.parametrize("in_arr,", _cpu_array_samples())

The "," inside the string was a stray character. pytest 9.1.0 splits the names on commas, ends up with one name but 3-element values, and fails collection with:

in "parametrize" the number of names (1):
  ['in_arr']
must be equal to the number of values (3):
  (665115599, 23133, 0)

Fix: drop the trailing comma.
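To make the failure mode concrete, here is a toy model (not pytest's source) that reproduces the message above. It assumes, inferring only from that error, that pytest 9.1.0 treats a names string containing a comma like a tuple of names and therefore unpacks each value, whereas older pytest passed each value whole to a lone name. The sample tuple is the one printed in the error, standing in for an entry of _cpu_array_samples(); parse_argnames and collect are hypothetical names.

```python
def parse_argnames(argnames: str, comma_means_tuple: bool) -> tuple:
    """Return (names, wrap_each_value) for a comma-separated names string.

    Toy model only. comma_means_tuple=True models the stricter behaviour
    assumed for pytest 9.1.0; False models the older tolerance.
    """
    names = [n.strip() for n in argnames.split(",") if n.strip()]
    if comma_means_tuple and "," in argnames:
        # Any comma marks a tuple of names, so each value gets unpacked.
        return names, False
    # A lone name (after dropping empty entries) receives each value whole.
    return names, len(names) == 1


def collect(argnames: str, values: list, comma_means_tuple: bool) -> str:
    names, wrap_each_value = parse_argnames(argnames, comma_means_tuple)
    for value in values:
        unpacked = (value,) if wrap_each_value else tuple(value)
        if len(unpacked) != len(names):
            return (f"number of names ({len(names)}) must be equal to "
                    f"the number of values ({len(unpacked)})")
    return "clean"


# One 3-element sample, standing in for an entry of _cpu_array_samples().
samples = [(665115599, 23133, 0)]
```

Under that assumption, "in_arr," is clean only in the tolerant mode, while plain "in_arr" is clean in both modes.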

Bug 2: indirect=True override of a fixture-level parametrize (cuda_bindings)

cuda_bindings/tests/test_nvfatbin.py has an arch fixture parametrized with params=ARCHITECTURES. Two tests overrode it via @pytest.mark.parametrize("arch", ["sm_80"], indirect=True). pytest 9.1.0 now rejects this as:

duplicate parametrization of 'arch'

Fix: extract the CUBIN-building logic from the CUBIN fixture into a _build_cubin(arch) helper, drop the indirect override on the two affected tests, and call the helper directly with "sm_80". This preserves the original intent: those tests deliberately used only sm_80, because the target arch "75" must not match the CUBIN's arch.
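A stdlib-only sketch of the shape of this refactor. It assumes a hypothetical ARCHITECTURES list and a stand-in _build_cubin that only records the target arch (the real helper compiles a CUBIN). The pytest decorators, including the arch fixture's params=ARCHITECTURES, are omitted so the sketch runs without pytest; cubin_fixture and mismatch_inputs are hypothetical names.

```python
# Hypothetical stand-in; the real ARCHITECTURES list lives in test_nvfatbin.py.
ARCHITECTURES = ["sm_75", "sm_80", "sm_90"]


def _build_cubin(arch: str) -> dict:
    # Stand-in for the extracted helper: the real one builds a CUBIN for
    # `arch`; this one only records which arch the result targets.
    return {"arch": arch}


def cubin_fixture(arch: str) -> dict:
    # CUBIN fixture body after the refactor: behaviour unchanged, still one
    # CUBIN per value of the fixture-level `arch` parametrization.
    return _build_cubin(arch)


def mismatch_inputs() -> tuple:
    # Inputs of one mismatch test after the refactor: no indirect override
    # of `arch`; the test builds the sm_80 CUBIN it needs and pairs it with
    # target arch "75", which must not match.
    return _build_cubin("sm_80"), "75"
```

The fixture keeps its full parametrization, and the two mismatch tests no longer declare a second parametrization of arch at all.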

Backwards compatibility

Both fixes are pytest-version-agnostic, so the pytest pin (pytest>=6.2.4) doesn't need to change. Verified by collecting minimal repros of each pattern against three pytest versions:

| pytest | Bug 1 (broken) | Bug 1 (fixed) | Bug 2 (broken) | Bug 2 (fixed) |
|---|---|---|---|---|
| 9.1.0 | collection error | clean | collection error | clean |
| 9.0.2 | clean (tolerant) | clean | clean (tolerant) | clean |
| 8.4.2 | clean (tolerant) | clean | clean (tolerant) | clean |

Reference

Affected CI runs on main:

Same pattern on my open #2210: https://github.com/NVIDIA/cuda-python/actions/runs/27489049015. Its 38 "Run cuda.core tests" failures and 23 "Run cuda.bindings tests" failures all stem from these two collection errors.

Two latent test-code bugs that older pytest tolerated but pytest 9.1.0
flags as collection errors, breaking every Test job on main since the
pytest 9.1.0 release:

* cuda_core/tests/test_utils.py:151 had a stray trailing comma in the
  `parametrize` name string (`"in_arr,"`). pytest 9.1.0 now splits names
  on comma and counts them, mismatching against the multi-element value
  tuples. Drop the comma.

* cuda_bindings/tests/test_nvfatbin.py had two tests using
  `@pytest.mark.parametrize("arch", ["sm_80"], indirect=True)` to
  override the fixture-level `arch` parametrization. pytest 9.1.0 now
  rejects this combination as "duplicate parametrization of 'arch'".
  Extract the CUBIN-building logic into a `_build_cubin(arch)` helper,
  drop the indirect override on the two tests, and call the helper
  inline with the hardcoded `"sm_80"` they need. Preserves intent (the
  override existed because target arch "75" must not match the CUBIN's
  arch).

Both fixes are pytest-version-agnostic; verified collecting cleanly
under pytest 9.1.0, 9.0.2, and 8.4.2 with minimal reproductions of
each pattern.
copy-pr-bot (bot) commented Jun 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions (bot) added the labels cuda.bindings (Everything related to the cuda.bindings module) and cuda.core (Everything related to the cuda.core module) Jun 14, 2026

rwgk left a comment

I saw ... it can only get better!

rwgk left a comment

GPT-5.5:

No code findings from my review.

The two edits look correct and narrowly scoped:

  • cuda_core/tests/test_utils.py: fixes the stray @pytest.mark.parametrize("in_arr,", ...) name. _cpu_array_samples() supplies one argument per case, so in_arr is the intended single parameter name.
  • cuda_bindings/tests/test_nvfatbin.py: extracts the old CUBIN fixture body into _build_cubin(arch), keeps the fixture behavior unchanged, and lets the two mismatch tests build only sm_80 without re-parametrizing the existing arch fixture.

Operationally, I would not call the PR merge-ready until full CI runs. Right now the visible checks only include path-label/restricted-path/metadata checks plus pre-commit.ci, and the copy-pr-bot comment says the PR still needs validation before NVIDIA runner workflows can run. Code-wise this looks ready to test; process-wise it still needs the full CI trigger and a green run before merging.

rwgk commented Jun 14, 2026

/ok to test fadd5bd


rwgk marked this pull request as ready for review June 14, 2026 18:30
rwgk enabled auto-merge (squash) June 14, 2026 18:30
rwgk added this to the cuda.bindings next milestone Jun 14, 2026
rwgk added the P0 (High priority - Must do!) label Jun 14, 2026
rwgk merged commit a9156b6 into NVIDIA:main Jun 14, 2026
108 of 110 checks passed
github-actions commented:
Doc Preview CI
Preview removed because the pull request was closed or merged.

leofang deleted the leofang/fix-pytest9-collection-errors branch June 15, 2026 13:48
leofang added a commit that referenced this pull request Jun 16, 2026
Backport of #2212, scoped down to the cuda_bindings/tests/test_nvfatbin.py
portion that applies to 12.9.x. The cuda_core/tests/test_utils.py portion
of #2212 (the trailing-comma-in-parametrize-name fix) does not apply here
because the 12.9.x version of that test file does not have the bug — its
parametrize uses two names matching tuple values.

What is fixed (verbatim from #2212):

  cuda_bindings/tests/test_nvfatbin.py had two tests using
  @pytest.mark.parametrize("arch", ["sm_80"], indirect=True) to override
  the fixture-level `arch` parametrization. pytest 9.1.0 now rejects this
  combination as "duplicate parametrization of 'arch'". Extract the
  CUBIN-building logic into a _build_cubin(arch) helper, drop the indirect
  override on the two tests, and call the helper inline with the
  hardcoded "sm_80" they need. Preserves intent (the override existed
  because target arch "75" must not match the CUBIN's arch).

Closes #2226. Hunk body verified identical to the corresponding hunk in
#2212 (commit a9156b6).
leofang added a commit to leofang/cuda-python that referenced this pull request Jul 1, 2026
Two nightly failure fixups after the first green iteration:

nightly-numba-cuda-mlir: numba-cuda-mlir 0.4.0 has an inverted guard
that registers an overload of np.row_stack on NumPy 2.x, and NumPy 2.5
removed that name entirely, so test collection fails with
"AttributeError: module 'numpy' has no attribute 'row_stack'". Cap
numpy to <2.5. See NVIDIA/numba-cuda-mlir#154.
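One way to avoid this class of bug is to feature-detect the attribute instead of branching on the NumPy major version. The sketch below is hypothetical and is not numba-cuda-mlir's code: register_row_stack_overload and the registry dict are illustrative names, and SimpleNamespace objects stand in for NumPy modules with and without row_stack.

```python
from types import SimpleNamespace


def register_row_stack_overload(np_module, registry: dict) -> bool:
    # Feature-detect the attribute instead of checking the NumPy major
    # version: if the name is gone, skip registration rather than failing
    # at import time with AttributeError.
    func = getattr(np_module, "row_stack", None)
    if func is None:
        return False
    registry["row_stack"] = func
    return True


# Stand-ins for a NumPy that still has row_stack and one that removed it.
numpy_with = SimpleNamespace(row_stack=lambda *arrays: arrays)
numpy_without = SimpleNamespace()
```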

nightly-cuda-core: released cuda-core v1.0.1's test suite uses a
parametrize argvalues pattern that pytest 9.1 rejects
("in parametrize the number of names (1)... must be equal to the
number of values (3)"). The main-side fix was NVIDIA#2212 but it has not
shipped in a cuda-core release yet. Cap pytest to <9.1 for the
released-cuda-core test run only.
leofang added a commit that referenced this pull request Jul 2, 2026
* CI: add nightly-cuda-core and nightly-numba-cuda-mlir modes

nightly-cuda-core: test the released cuda-core from PyPI against
main-built pathfinder and cuda-bindings, catching the "core released ×
bindings main" gap documented in issue #1955. Runs on linux-64 (a100)
and win-64 (a100 MCDM).

nightly-numba-cuda-mlir: MLIR-backend companion to nightly-numba-cuda.
Installs main pathfinder+bindings+core plus numba-cuda-mlir from PyPI,
runs numba-cuda-mlir's own test suite from the matching git tag.
Linux amd64/arm64 x CUDA 12.9.1 / 13.3.0.

Both modes fetch the released version's tests from git tags because
the respective wheels do not ship test_*.py files. Includes
tag-not-found fallback (log warning + exit 0) to avoid red-lining the
nightly on a freshly-cut PyPI release that hasn't been pushed to git
yet.

* ci/test-matrix.yml: fix CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM typo

The two ENV overrides intended to exercise the per-thread default
stream code path were misspelled (missing the CUDA_ segment), so the
env var was silently ignored and the PTDS coverage added in #1972 had
no effect. Rename to the correct
CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM.

Refs #971.

* cuda_pathfinder: pin nvshmem to <3.7 (was previously excluding only 3.7.0)

nvidia-nvshmem-cu{12,13} 3.7.x breaks the main branch, not only 3.7.0. Widen the exclusion from the single 3.7.0 release to <3.7 so that 3.7.x and later releases are avoided until we can move forward.
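The difference between the two pins can be sketched with plain version tuples. This assumes numeric X.Y.Z versions (pip actually applies PEP 440 rules) and models the old pin as "exclude exactly 3.7.0", per the commit title; the function names and the sample versions in the check are hypothetical.

```python
def version_key(version: str) -> tuple:
    # Pad to three numeric components so "3.7" compares equal to "3.7.0".
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def allowed_by_old_pin(version: str) -> bool:
    # Old behaviour: exclude only the exact 3.7.0 release.
    return version_key(version) != (3, 7, 0)


def allowed_by_new_pin(version: str) -> bool:
    # New behaviour: "<3.7" excludes 3.7.0, 3.7.x and everything later.
    return version_key(version) < (3, 7, 0)
```

A patch release such as 3.7.1 slips through the old exclusion but is rejected by the new one.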

* nightly-numba-cuda-mlir: swap arm64 for win-64 coverage, use rtxpro6000

Drop the linux-aarch64 rows and instead add win-64 coverage with the
same CUDA 12.9.1 / 13.3.0 pair. Switch all four rows from GPU l4 to
rtxpro6000. Windows rows use DRIVER_MODE MCDM, matching the existing
rtxpro6000 CUDA 13.3.0 patterns.

* Temporarily add push trigger to ci-nightly.yml for testing

Remove before merging.

* CI: switch nightly-{cuda-core,numba-cuda-mlir} to actions/checkout for tests

The initial approach used git inside the ubuntu:24.04 container to fetch
the released version's test suite, but git is not installed on that
container (install_unix_deps only pulls in jq/wget/g++/etc.) and its
absence made the run steps silently skip via the tag-not-fetchable
fallback. On Windows, git archive of just the cuda_core subtree also hit
a dangling-symlink extraction failure (cuda_core/.git_archival.txt).

Refactor to:

- run-tests: just install wheels and expose the resolved release version
  (CUDA_CORE_RELEASED_VER / NUMBA_CUDA_MLIR_VER) and cuda-core test-group
  name via GITHUB_ENV. No more git operations.
- test-wheel-{linux,windows}.yml: add an actions/checkout step per mode
  that pulls the matching release tag into a subdirectory
  (cuda-core-released / numba-cuda-mlir-released), then the follow-up
  test step installs that tag's test dep-group and runs pytest.
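For reference, the GITHUB_ENV handoff works by appending NAME=value lines to the file whose path GitHub Actions puts in $GITHUB_ENV; later steps in the same job then see those variables. The run-tests script itself is shell, so this Python helper (export_to_github_env, a hypothetical name) is only an equivalent sketch, limited to single-line values.

```python
import os
from typing import Mapping, Optional


def export_to_github_env(values: Mapping[str, str], env_file: Optional[str] = None) -> None:
    """Append NAME=value lines to the file named by $GITHUB_ENV.

    Variables written here are visible to later steps in the same job,
    not to the step that writes them. Multi-line values need GitHub's
    delimiter syntax and are rejected here.
    """
    path = env_file if env_file is not None else os.environ["GITHUB_ENV"]
    with open(path, "a", encoding="utf-8") as fh:
        for name, value in values.items():
            if "\n" in value:
                raise ValueError(f"{name}: multi-line values are not supported")
            fh.write(f"{name}={value}\n")
```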

For numba-cuda-mlir also pass --ignore=tests/benchmarks
--ignore=tests/doc_examples to pytest: those directories import the
`numba` package at module top and would fail collection, which is
cuSIMT's expected behavior (see NVIDIA/numba-cuda-mlir#136 — cuSIMT
intentionally does not depend on numba).

* CI: pin numpy<2.5 (mlir) and pytest<9.1 (cuda-core released tests)

Two nightly failure fixups after the first green iteration:

nightly-numba-cuda-mlir: numba-cuda-mlir 0.4.0 has an inverted guard
that registers an overload of np.row_stack on NumPy 2.x, and NumPy 2.5
removed that name entirely, so test collection fails with
"AttributeError: module 'numpy' has no attribute 'row_stack'". Cap
numpy to <2.5. See NVIDIA/numba-cuda-mlir#154.

nightly-cuda-core: released cuda-core v1.0.1's test suite uses a
parametrize argvalues pattern that pytest 9.1 rejects
("in parametrize the number of names (1)... must be equal to the
number of values (3)"). The main-side fix was #2212 but it has not
shipped in a cuda-core release yet. Cap pytest to <9.1 for the
released-cuda-core test run only.

* CI: deselect known pre-existing failures in nightly-cuda-core and nightly-numba-cuda-mlir

Applied only in the affected nightly-* pytest invocations; the released
source trees under test are unmodified.

nightly-numba-cuda-mlir (all 10 tests deselected are from cuSIMT):

  * CudaArraySetting::{test_no_sync_default_stream, test_no_sync_supplied_stream, test_sync}
    TestCudaArrayInterface::{test_consume_no_sync, test_consume_sync,
                             test_launch_no_sync, test_launch_sync,
                             test_launch_sync_two_streams, test_fortran_contiguous}
      Serial-pytest contamination of numba_cuda_mlir.cuda.cudadrv from an
      xfailed test in test_nrt_comprehensive.py. Upstream CI runs with
      `pytest -n auto --dist loadscope`, which isolates the offending
      side effect in a separate xdist worker; our nightly runs serially
      and hits the pollution. See NVIDIA/numba-cuda-mlir#135.
  * TestLinkerDumpAssembly::test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn
      Subprocess-invokes `cuobjdump`, which isn't on PATH in the base
      ubuntu:24.04 container. Filed as an upstream skip-guard bug.

nightly-cuda-core (3 tests deselected are pre-existing v1.0.1 issues):

  * test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion]
      Expected drift: main cuda-bindings adds NvlinkVersion.VERSION_6_0
      which v1.0.1's wrapper mapping predates. This mode intentionally
      pairs released core with main bindings, so this coverage-style
      test will stay red here until a cuda-core release catches up.
  * test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive
      Environment-dependent test: expects rlcompleter to crash without
      the tab-completion patch, but on Windows MCDM the pre-patch
      behavior is clean. Passes on Linux, fails on Windows MCDM.
  * test_memory.py::test_non_managed_resources_report_not_managed[pinned]
      Same underlying "Failed to allocate memory from pool" error that
      v1.0.1 already xfails in the sibling test_pinned_memory_resource_initialization
      (TODO(#9999)). cuda-python main has since fixed the parametrized
      case to route through _allocate_pinned_buffer_or_xfail(), but that
      fix hasn't shipped in a cuda-core release yet.

* CI: tighten deselects to per-platform failing sets

Previously applied the same list on both Linux and Windows workflows,
which over-deselected — some tests only fail on one platform because
the underlying issues (serial-pytest test-order in mlir, MCDM-only
behavior in cuda-core) are platform-specific.

Now:

nightly-numba-cuda-mlir
  linux-64: TestCudaArrayInterface::{test_consume_no_sync,
    test_consume_sync, test_launch_no_sync, test_launch_sync,
    test_launch_sync_two_streams, test_fortran_contiguous}
    + TestLinkerDumpAssembly::test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn.
  win-64: CudaArraySetting::{test_no_sync_default_stream,
    test_no_sync_supplied_stream, test_sync}
    + TestCudaArrayInterface::test_fortran_contiguous.

Test-order contamination in numba-cuda-mlir#135 surfaces different
tests depending on collection order (linux-64 vs win-64 exercise
different subsets), so the per-platform lists differ. cuobjdump-based
TestLinkerDumpAssembly only fires on Linux because the ubuntu:24.04
container's PATH lacks cuobjdump; Windows runners ship it with the
local CTK.

nightly-cuda-core
  linux-64: test_enum_coverage.py::test_wrapper_covers_all_binding_members[NvlinkVersion].
  win-64: NvlinkVersion (same as Linux)
    + test_rlcompleter_patch.py::test_opt_out_env_var_disables_patch_even_when_interactive
    + test_memory.py::test_non_managed_resources_report_not_managed[pinned].

rlcompleter and pinned mempool tests only fail on Windows MCDM.
NvlinkVersion fails on both (expected drift for the mode).

* CI: version-gate the nightly-mode deselects so they auto-clean

Each deselect is now wrapped in a bash conditional keyed on the
installed release version. When a newer numba-cuda-mlir or cuda-core
release ships with the referenced fix, the nightly picks it up
automatically, the guard evaluates false, and the deselect drops — so
the tests run against the new release. If they still fail we hear
about it loudly rather than silently masking a regression.

Current guards:
- numba-cuda-mlir #135 tests + cuobjdump TestLinkerDumpAssembly:
  applied when installed numba-cuda-mlir version <= 0.4.0.
- cuda-core NvlinkVersion / rlcompleter opt-out / pinned mempool:
  applied when installed cuda-core version <= 1.0.1.

Structure keeps one conditional block per (mode, platform) with a
comment above each deselect explaining the tracking issue.
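A sketch of the version-gating idea in Python (the actual workflow uses bash conditionals). gated_deselects is a hypothetical helper; it assumes plain numeric release versions such as "0.4.0" or "1.0.1" and emits pytest --deselect arguments only while the installed release is at or below the last affected one.

```python
def parse_version(version: str) -> tuple:
    # Assumes plain numeric versions ("0.4.0", "1.0.1"); pre-release or
    # local suffixes would need packaging.version instead.
    return tuple(int(part) for part in version.split("."))


def gated_deselects(installed: str, last_affected: str, node_ids: list) -> list:
    """Return --deselect args only while `installed` <= `last_affected`."""
    if parse_version(installed) > parse_version(last_affected):
        # Newer release: assume the fix shipped and run everything, so a
        # lingering failure shows up instead of staying masked.
        return []
    args = []
    for node_id in node_ids:
        args += ["--deselect", node_id]
    return args
```

With cuda-core 1.0.1 installed the deselects apply; once any later cuda-core release is installed, the call returns an empty list and the tests run again.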

* CI: broaden mlir deselect list to full #135 union across platforms

The previous per-platform-tight lists were incomplete: NVIDIA/numba-cuda-mlir#135's
import-time contamination poisons whichever tests reference
cuda.cudadrv.driver AFTER the polluting xfail runs, and collection
order varies between runs. Two consecutive Windows CI runs failed on
different subsets (3 slicing tests one run, 5 interface tests the
next).

Deselect the full union of #135-listed tests + test_fortran_contiguous
(observed to hit the same contamination) on both Linux and Windows.
Same version guard (<= 0.4.0) still applies, so the whole block drops
automatically when a newer numba-cuda-mlir release ships with the
root-cause fix.

Linux keeps the extra cuobjdump deselect (Linux-only environment
issue).
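The resulting per-platform lists can be expressed as set unions. This sketch uses the class::method names from the earlier deselect commit; real --deselect arguments would also need each test's file path, which the commit message doesn't give, and the variable names are hypothetical.

```python
CUDA_ARRAY_SETTING = {f"CudaArraySetting::{m}" for m in (
    "test_no_sync_default_stream", "test_no_sync_supplied_stream", "test_sync")}
CUDA_ARRAY_INTERFACE = {f"TestCudaArrayInterface::{m}" for m in (
    "test_consume_no_sync", "test_consume_sync", "test_launch_no_sync",
    "test_launch_sync", "test_launch_sync_two_streams", "test_fortran_contiguous")}
CUOBJDUMP = {"TestLinkerDumpAssembly::"
             "test_nvjitlink_jit_with_linkable_code_lto_dump_assembly_warn"}

# The #135 contamination can hit any of these depending on collection
# order, so both platforms deselect the full union.
CONTAMINATION = CUDA_ARRAY_SETTING | CUDA_ARRAY_INTERFACE

DESELECTS = {
    # Linux additionally lacks cuobjdump on PATH in the ubuntu:24.04 container.
    "linux-64": CONTAMINATION | CUOBJDUMP,
    "win-64": CONTAMINATION,
}
```

This yields 10 deselects on linux-64 and 9 on win-64, with the Windows set a strict subset of the Linux one.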

* Revert "cuda_pathfinder: pin nvshmem to <3.7 (was previously excluding only 3.7.0)"

This reverts commit 2a42aa7.

* Revert "Temporarily add push trigger to ci-nightly.yml for testing"

This reverts commit a0ccd19.